142 research outputs found

    A structural classification of protein-protein interactions for detection of convergently evolved motifs and for prediction of protein binding sites on sequence level

    BACKGROUND: A long-standing challenge in the post-genomic era of Bioinformatics is the prediction of protein-protein interactions, and ultimately the prediction of protein functions. The problem is intrinsically harder when only amino acid sequences are available, but a solution is more universally applicable. So far, the problem of uncovering protein-protein interactions has been addressed in a variety of ways, both experimentally and computationally. MOTIVATION: The central problem is: How can protein complexes with solved three-dimensional structure be utilized to identify and classify protein binding sites, and how can knowledge be inferred from this classification such that protein interactions can be predicted for proteins without solved structure? The underlying hypothesis is that protein binding sites are often restricted to a small number of residues, which additionally are often well-conserved in order to maintain an interaction. Therefore, the signal-to-noise ratio in binding sites is expected to be higher than in other parts of the surface. This enables binding site detection in unknown proteins when homology-based annotation transfer fails. APPROACH: The problem is addressed by first investigating how geometrical aspects of domain-domain associations can lead to a rigorous structural classification of the multitude of protein interface types. The interface types are explored with respect to two aspects: First, how do interface types with one-sided homology reveal convergently evolved motifs? Second, how can sequential descriptors for local structural features be derived from the interface type classification? Then, the use of sequential representations for binding sites in order to predict protein interactions is investigated. The underlying algorithms are based on machine learning techniques, in particular Hidden Markov Models. RESULTS: This work includes a novel approach to a comprehensive geometrical classification of domain interfaces.
Alternative structural domain associations are found for 40% of all family-family interactions. Evaluation of the classification algorithm on a hand-curated set of interfaces yielded a precision of 83% and a recall of 95%. For the first time, a systematic screen of convergently evolved motifs in 102,000 protein-protein interactions with structural information is derived. With respect to this dataset, all cases related to viral mimicry of human interface bindings are identified. Finally, a library of 740 motif descriptors for binding site recognition, encoded as Hidden Markov Models, is generated and cross-validated. Tests for the significance of motifs are provided. The usefulness of descriptors for protein-ligand binding sites is demonstrated for the case of "ATP-binding", where a precision of 89% is achieved, thus outperforming comparable motifs from PROSITE. In particular, a novel descriptor for a P-loop variant has been used to identify ATP-binding sites in 60 protein sequences that have not been annotated before by existing motif databases.

    A Framework for Technology Forecasting and Visualization

    This paper presents a novel framework for supporting the development of well-informed research policies and plans. The proposed methodology is based on the use of bibliometrics; i.e., analysis is conducted using information regarding trends and patterns of publication. Information thus obtained is analyzed to predict probable future developments in the technological fields being studied. While using bibliometric techniques to study science and technology is not a new idea, the proposed approach extends previous studies in a number of important ways. Firstly, instead of being purely exploratory, the focus of our research has been on developing techniques for detecting technologies that are in the early growth phase, characterized by a rapid increase in the number of relevant publications. Secondly, to increase the reliability of the forecasting effort, we propose the use of automatically generated keyword taxonomies, allowing the growth potentials of subordinate technologies to be aggregated into the overall potential of larger technology categories. As a demonstration, a proof-of-concept implementation of each component of the framework is presented, and is used to study the domain of renewable energy technologies. Results from this analysis are presented and discussed.

    Comparison of Generality Based Algorithm Variants for Automatic Taxonomy Generation

    We compare a family of algorithms for the automatic generation of taxonomies by adapting the Heymann algorithm in various ways. The core algorithm determines the generality of terms and iteratively inserts them into a growing taxonomy. Variants of the algorithm are created by altering how, and how often, the generality of terms is calculated. We analyse the performance and the complexity of the variants combined with a systematic threshold evaluation on a set of seven manually created benchmark sets. As a result, betweenness centrality calculated on unweighted similarity graphs often performs best but requires threshold fine-tuning and is computationally more expensive than closeness centrality. Finally, we show how an entropy-based filter can lead to more precise taxonomies.
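
    The core loop described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the input format (a dictionary of pairwise term similarities), both threshold values, and the use of closeness centrality as the generality measure are assumptions chosen for illustration. Terms are ranked by centrality in an unweighted similarity graph, then inserted one by one under the most similar term already in the taxonomy, or under the root when no similarity reaches the insertion threshold.

```python
from collections import deque


def closeness_centrality(graph):
    """Closeness centrality for an unweighted graph {node: set(neighbours)}.

    Scores are scaled by the fraction of reachable nodes so that members
    of small disconnected components are not ranked above real hubs.
    """
    n = len(graph)
    scores = {}
    for source in graph:
        # Breadth-first search gives shortest path lengths without weights.
        dist = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for neighbour in graph[node]:
                if neighbour not in dist:
                    dist[neighbour] = dist[node] + 1
                    queue.append(neighbour)
        reachable = len(dist) - 1
        total = sum(dist.values())
        if total > 0 and n > 1:
            scores[source] = (reachable / total) * (reachable / (n - 1))
        else:
            scores[source] = 0.0
    return scores


def build_taxonomy(similarity, graph_threshold=0.5, insert_threshold=0.3, root="ROOT"):
    """Return {term: parent} built by generality-ordered insertion.

    similarity: {(term_a, term_b): value in [0, 1]}, treated as symmetric.
    Both thresholds are illustrative defaults, not values from the paper.
    """
    terms = sorted({t for pair in similarity for t in pair})

    def sim(a, b):
        return similarity.get((a, b), similarity.get((b, a), 0.0))

    # Unweighted similarity graph: keep an edge only above the threshold.
    graph = {t: set() for t in terms}
    for a in terms:
        for b in terms:
            if a != b and sim(a, b) >= graph_threshold:
                graph[a].add(b)

    centrality = closeness_centrality(graph)
    # Most general (most central) terms first; ties broken alphabetically.
    ordered = sorted(terms, key=lambda t: (-centrality[t], t))

    parent = {}
    for term in ordered:
        best, best_sim = root, 0.0
        for candidate in parent:  # terms already placed in the taxonomy
            s = sim(term, candidate)
            if s > best_sim:
                best, best_sim = candidate, s
        parent[term] = best if best_sim >= insert_threshold else root
    return parent


example = {
    ("energy", "solar"): 0.8,
    ("energy", "wind"): 0.7,
    ("energy", "biomass"): 0.6,
    ("solar", "photovoltaic"): 0.9,
    ("solar", "wind"): 0.2,
}
taxonomy = build_taxonomy(example)
```

    In this toy input, `energy` is the most central term and becomes the top-level node, while `photovoltaic` attaches to `solar`. Swapping `closeness_centrality` for a betweenness computation, or recomputing centrality after each insertion, would give variants of the kind the abstract compares.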

    SCOPPI: a structural classification of protein–protein interfaces

    SCOPPI, the structural classification of protein–protein interfaces, is a comprehensive database that classifies and annotates domain interactions derived from all known protein structures. SCOPPI applies SCOP domain definitions and a distance criterion to determine inter-domain interfaces. Using a novel method based on multiple sequence and structural alignments of SCOP families, SCOPPI presents a comprehensive geometrical classification of domain interfaces. Various interface characteristics such as number, type and position of interacting amino acids, conservation, interface size, and permanent or transient nature of the interaction are further provided. Proteins in SCOPPI are annotated with Gene Ontology terms, and the ontology can be used to quickly browse SCOPPI. Screenshots are available for every interface and its participating domains. Here, we describe contents and features of the web-based user interface as well as the underlying methods used to generate SCOPPI's data. In addition, we present a number of examples where SCOPPI becomes a useful tool to analyze viral mimicry of human interface binding sites, gene fusion events, conservation of interface residues and diversity of interface localizations. SCOPPI is available at
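
    A distance criterion of the kind mentioned in this abstract can be sketched as follows. This is only an illustration, not SCOPPI's code: the 5 Å cutoff, the function names, and the input layout (each domain as a dictionary mapping a residue label to a list of atom coordinates) are all assumptions. A residue is counted as part of the interface when any of its atoms lies within the cutoff of any atom in the other domain.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two 3D points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def interface_residues(domain_a, domain_b, cutoff=5.0):
    """Return the residues of each domain that contact the other domain.

    domain_a, domain_b: {residue_label: [(x, y, z), ...]} (hypothetical layout)
    cutoff: contact distance in angstroms (assumed value, not from the abstract)
    """
    cutoff_sq = cutoff ** 2  # compare squared distances to avoid square roots
    contacts_a, contacts_b = set(), set()
    for res_a, atoms_a in domain_a.items():
        for res_b, atoms_b in domain_b.items():
            if any(
                squared_distance(p, q) <= cutoff_sq
                for p in atoms_a
                for q in atoms_b
            ):
                contacts_a.add(res_a)
                contacts_b.add(res_b)
    return contacts_a, contacts_b


# Toy coordinates: only A10 and B5 are close enough to form a contact.
toy_a = {"A10": [(0.0, 0.0, 0.0)], "A11": [(20.0, 0.0, 0.0)]}
toy_b = {"B5": [(3.0, 0.0, 0.0)], "B6": [(0.0, 30.0, 0.0)]}
side_a, side_b = interface_residues(toy_a, toy_b)
```

    The all-pairs loop is quadratic in the number of atoms, which is acceptable for a sketch; a spatial index such as a k-d tree would be the usual choice for whole-database runs.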

    A Unified Approach for Taxonomy-based Technology Forecasting

    For decision makers and researchers working in a technical domain, understanding the state of their area of interest is of the highest importance. For this reason, we consider in this chapter a novel framework for Web-based technology forecasting using bibliometrics (i.e. the analysis of information from trends and patterns of scientific publications). The proposed framework consists of a few conceptual stages based on a data acquisition process from online bibliographic repositories: extraction of domain-relevant keywords, generation of a taxonomy of the research field of interest, and development of early growth indicators that help to find interesting technologies in their first phase of development. To provide a concrete application domain for developing and testing our tools, we conducted a case study in the field of renewable energy and in particular one of its subfields: Waste-to-Energy (W2E). The results on this particular research domain confirm the benefit of our approach.

    Genetic diversity and low stratification of the population of the United Arab Emirates

    Copyright © 2020 Tay, Henschel, Daw Elbait and Al Safar. With high consanguinity rates on the Arabian Peninsula, it would not have been unexpected if the population of the United Arab Emirates (UAE) was shown to be relatively homogenous. However, this study of 1000 UAE nationals provided a contrasting perspective, one of a relatively heterogeneous population. Located at the apex of Europe, Asia, and Africa, the observed diversity could be explained by a plethora of migration patterns since the first Out-of-Africa movement. A strategy to explore the extent of genetic variation of the population of the UAE is presented. The first step involved a comprehensive population stratification study that was instructive for subsequent whole genome sequencing (WGS) of suitable representatives (which is described elsewhere). When these UAE data were compared to previous smaller studies from the region, the findings were consistent with a population that is a diverse and admixed group of people. However, rather than sharp and distinctive clusters, cluster analysis reveals low levels of stratification throughout the population. UAE emirates exhibit high within-Emirate-distance/among-Emirate distance ratios. Supervised admixture analysis showed a continuous gradient of ancestral populations, suggesting that admixture on the south eastern tip of the Arabian Peninsula occurred gradually. When visualized using a unique technique that combined admixture ratios and principal component analysis (PCA), unappreciated diversity was revealed while mitigating projection bias of conventional PCA. We observe low population stratification in the UAE in terms of homozygosity versus separation cluster coefficients. This holds for the UAE in a global context as well as for isolated cluster analysis of the Emirati birthplaces. However, the subtle clustering observed in the Emirates reflects geographic proximity and historic migration events.
The analytical strategy used here highlights the complementary nature of data from genotype array and WGS for anthropological studies. Specifically, genotype array data were instructive to select representative subjects for WGS. Furthermore, from the 2.3 million allele frequencies obtained from genotype arrays, we identified 46,481 loci with allele frequencies that were significantly different with respect to other world populations. This comparison of allele frequencies facilitates variant prioritization in common diseases. In addition, these loci bear great potential as biomarkers in anthropological and forensic studies.

    Myofascial Trigger Points in Children With Tension-Type Headache: A New Diagnostic and Therapeutic Option

    The goal of this pilot study was to evaluate the effect of a trigger point–specific physiotherapy on headache frequency, intensity, and duration in children with episodic or chronic tension-type headache. Patients were recruited from the special headache outpatient clinic. A total of 9 girls (mean age 13.1 years; range, 5-15 years) with the diagnosis of tension-type headache participated in the pilot study from May to September 2006 and received trigger point–specific physiotherapy twice a week by a trained physiotherapist. After an average number of 6.5 therapeutic sessions, the headache frequency had been reduced by 67.7%, intensity by 74.3%, and duration by 77.3%. No side effects were noted during the treatment. These preliminary findings suggest a role for active trigger points in children with tension-type headache. Trigger point–specific physiotherapy seems to be an effective therapy in these children. Further prospective and controlled studies in a larger cohort are warranted.

    The Many Faces of Protein–Protein Interactions: A Compendium of Interface Geometry

    A systematic classification of protein–protein interfaces is a valuable resource for understanding the principles of molecular recognition and for modelling protein complexes. Here, we present a classification of domain interfaces according to their geometry. Our new algorithm uses a hybrid approach of both sequential and structural features. The accuracy is evaluated on a hand-curated dataset of 416 interfaces. Our hybrid procedure achieves 83% precision and 95% recall, which improves the earlier sequence-based method by 5% on both terms. We classify virtually all domain interfaces of known structure, which results in nearly 6,000 distinct types of interfaces. In 40% of the cases, the interacting domain families associate in multiple orientations, suggesting that all the possible binding orientations need to be explored for modelling multidomain proteins and protein complexes. In general, hub proteins are shown to use distinct surface regions (multiple faces) for interactions with different partners. Our classification provides a convenient framework to query genuine gene fusion, which conserves binding orientation in both fused and separate forms. The result suggests that the binding orientations are not conserved in at least one-third of the gene fusion cases detected by a conventional sequence similarity search. We show that any evolutionary analysis on interfaces can be skewed by multiple binding orientations and multiple interaction partners. The taxonomic distribution of interface types suggests that ancient interfaces common to the three major kingdoms of life are enriched by symmetric homodimers. The classification results are online at http://www.scoppi.org

    A population-specific major allele reference genome from the United Arab Emirates population

    The ethnic composition of the population of a country contributes to the uniqueness of each national DNA sequencing project and, ideally, individual reference genomes are required to reduce the confounding nature of ethnic bias. This work presents a representative Whole Genome Sequencing effort of an understudied population. Specifically, high coverage consensus sequences from 120 whole genomes and 33 whole exomes were used to construct the first ever population specific major allele reference genome for the United Arab Emirates (UAE). When this was applied and compared to the archetype hg19 reference, assembly of local Emirati genomes was reduced by ~19% (i.e., some 1 million fewer calls). In compiling the United Arab Emirates Reference Genome (UAERG), sets of 23,038,090 annotated short (novel: 1,790,171) and 137,713 structural (novel: 8,462) variants, together with their allele frequencies (AFs) and distribution across the genome, were identified. Population-specific genetic characteristics including loss-of-function variants, admixture, and ancestral haplogroup distribution were identified and reported here. We also detect a strong correlation between F and admixture components in the UAE. This baseline study was conceived to establish a high-quality reference genome and a genetic variations resource to enable the development of regional population specific initiatives and thus inform the application of population studies and precision medicine in the UAE.

    Whole genome sequencing of four representatives from the admixed population of the United Arab Emirates

    Copyright © 2020 Daw Elbait, Henschel, Tay and Al Safar. Whole genome sequences (WGS) of four nationals of the United Arab Emirates (UAE) at an average coverage of 33X have been completed and described. The selection of suitable subpopulation representatives was informed by a preceding comprehensive population structure analysis. Representatives were chosen based on their central location within the subpopulation on a principal component analysis (PCA) and the degree to which they were admixed. Novel genomic variations among the different subgroups of the UAE population are reported here. Specifically, the WGS analysis identified 4,161,067–4,798,806 variants in the four individual samples, where approximately 80% were single nucleotide polymorphisms (SNPs) and 20% were insertions or deletions (indels). An average of 2.75% was found to be novel variants according to dbSNP (build 151). This is the first report of structural variants (SV) from WGS data from UAE nationals. There were 15,677–20,339 called SVs, of which around 13.5% were novel. The four samples shared 1,399,178 variants, each with distinct variants as follows: 1,085,524 (for the individual denoted as UAE S011), 1,228,559 (UAE S012), 791,072 (UAE S013), and 906,818 (UAE S014). These results show a previously unappreciated population diversity in the region. The synergy of WGS and genotype array data was demonstrated through variant annotation of the former using 2.3 million allele frequencies for the local population derived from the latter technology platform. This novel approach of combining breadth and depth of array and WGS technologies has guided the choice of population genetic representatives and provides complementary, regionalized allele frequency annotation to new genomes comprising millions of loci.